FlashProfile: Interactive Synthesis of Syntactic Profiles

Authors

  • Saswat Padhi
  • Prateek Jain
  • Daniel Perelman
  • Oleksandr Polozov
  • Sumit Gulwani
  • Todd D. Millstein
Abstract

We address the problem of learning comprehensive syntactic profiles for a set of strings. Real-world datasets, typically curated from multiple sources, often contain data in various formats. Thus, any data processing task is preceded by the critical step of data format identification. However, manual inspection of data to identify various formats is infeasible in standard big-data scenarios. We present a technique for generating comprehensive syntactic profiles in terms of user-defined patterns that also allows for interactive refinement. We define a syntactic profile as a set of succinct patterns that describe the entire dataset. Our approach efficiently learns such profiles, and allows refinement by exposing a desired number of patterns. Our implementation, FlashProfile, shows a median profiling time of 0.7 s over 142 tasks on 74 real datasets. We also show that access to the generated data profiles allows for more accurate synthesis of programs, using fewer examples in programming-by-example workflows.
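To make the notion of a syntactic profile concrete, here is a minimal sketch in Python. It is not FlashProfile's algorithm: the abstract describes learned, user-defined patterns with interactive refinement, whereas this sketch assumes a fixed set of character classes and simply groups strings that share the same class sequence. The names `char_class`, `pattern_of`, and `profile` are hypothetical and chosen for illustration only.

```python
from collections import defaultdict
from itertools import groupby

# Assumed, fixed character classes; FlashProfile itself supports user-defined patterns.
CLASSES = {"Digit", "Upper", "Lower", "Space"}


def char_class(ch: str) -> str:
    """Map a character to a coarse class name, or to a quoted literal."""
    if ch.isdigit():
        return "Digit"
    if ch.isupper():
        return "Upper"
    if ch.islower():
        return "Lower"
    if ch.isspace():
        return "Space"
    return repr(ch)  # punctuation and other symbols are kept as literals


def pattern_of(s: str) -> str:
    """Describe a string as a sequence of class runs and literal characters."""
    tokens = []
    for cls, run in groupby(s, key=char_class):
        count = len(list(run))
        if cls in CLASSES:
            tokens.append(f"{cls}+")      # a run of one or more characters of this class
        else:
            tokens.extend([cls] * count)  # literals are repeated as they appear
    return " ".join(tokens)


def profile(strings):
    """Group strings by pattern; the set of keys is a naive 'profile' of the dataset."""
    clusters = defaultdict(list)
    for s in strings:
        clusters[pattern_of(s)].append(s)
    return dict(clusters)


if __name__ == "__main__":
    data = ["2017-09-18", "1999-01-02", "Sep 2017"]
    for pattern, members in profile(data).items():
        print(pattern, "<-", members)
```

In this sketch every distinct class sequence becomes its own pattern, so the number of patterns is not controllable. The refinement described in the abstract, where the user asks for a desired number of patterns, would require merging or splitting clusters, which is not attempted here.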

Similar resources

Interactive Paraphrasing Based on Linguistic Annotation

We propose a method “Interactive Paraphrasing” which enables users to interactively paraphrase words in a document by their definitions, making use of syntactic annotation and word sense annotation. Syntactic annotation is used for managing smooth integration of word sense definitions into the original document, and word sense annotation for retrieving the correct word sense definition for a wo...

PRISMA: An Interactive Model of Information Synthesis

In this paper, we describe an interactive information synthesis model (PRISMA). The user interacts with the system by means of automatically extracted lists of key concepts. The model uses syntactic knowledge to identify key concepts and to organize and display the pieces of information. We also propose, and put into practice, a corpus-based methodology for evaluating interactive models. T...

Development of the Semantic Aspect of the Verb in Persian-Speaking Children: A Longitudinal Study

Objective: Learning the “verb”, one of the main components of a sentence, has always been a debated topic in the process of language learning. One of the important issues in verb learning is determining its meaning using syntactic clues and learning its semantic aspects. Therefore, the main objective of this study was to examine the development of the semantic aspect of ...

5-sulfosalicylic acid as an efficient organocatalyst for environmentally benign synthesis of 2-substituted benzimidazoles

A water-soluble Brønsted acid, 5-sulfosalicylic acid, was used as an efficient organocatalyst for the synthesis of physiologically active 2-substituted benzimidazole derivatives from o-phenylenediamine and aromatic aldehydes in ethanol under reflux conditions. Cost-effectiveness, use of non-hazardous solvents, metal-free and commercially available catalyst, single-step, environmentally fri...

Mimicry of Tone Production: Results from a Pilot Experiment

In this paper we present the description and the first results of a pilot experiment in which participants were requested to mimic the production of sonic elements through different control modalities. Results show different degrees of dependence of the control temporal profiles on the dynamic level and temporal ordering of the stimuli. The protocol and methodology here advanced may turn usefu...

Journal:
  • CoRR

Volume: abs/1709.05725

Publication date: 2017